Assessing the Relationship between Train Strike Trespasser Fatalities in California and Population Density
DSAN 6750 / PPOL 6805: GIS for Spatial Data Science
Author: Lindsay Strong
Affiliation: Georgetown University
Introduction
From 2012 to 2017, there were 3,687 railroad trespasser fatalities across the United States (Kidda et al. 2020). Previous studies have assessed trends in trespasser strikes and emphasized that they are an urban rather than a rural problem. I will assess the relationship between population density and trespasser strikes using spatial data science techniques.
Because California has the most trespasser fatalities of any US state, I will limit my analysis to California (Kidda et al. 2020). I will use trespasser strike data from the Department of Transportation, which records the latitude and longitude of each strike as point data.
By assessing the relationship between trespasser fatalities in California and population density, I hope my findings can inform targeted interventions to prevent future trespasser strikes.
Literature Review
The Federal Railroad Administration (FRA) assessed trends in trespasser train strikes from 2012 to 2017. Of all U.S. states, California, New York, Florida, and Texas had the most trespasser strikes (Kidda et al. 2020). The FRA identified trends related to suicide, train type, time of day, age, and the individual’s action at the time of death (Kidda et al. 2020). That report does not assess the relationship between population density and trespasser strikes.
Northwestern University economics professor Ian Savage notes in his 2007 manuscript on railroad trespassing that trespasser strikes appear to be an urban rather than a rural problem, as “less than one quarter of fatalities occur outside of town or city limits” (Savage 2007).
Methodology
For population density, I used 2020 American Community Survey (ACS) 5-year census tract estimates for California, retrieved with tidycensus, and divided each tract’s total population by its total area. For hypothesis testing, I used the centroids of these census tracts to represent the underlying population density.
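The density step can be sketched in base R as follows. The tract populations, areas, and the square polygon are hypothetical; with the real sf tract geometries, sf::st_area() and sf::st_centroid() would typically replace the hand-written parts. The centroid helper uses the shoelace formula for a simple (non-self-intersecting) polygon.

```r
# Hypothetical tracts: made-up populations and land areas (not study data)
tracts <- data.frame(
  tract_id   = c("A", "B", "C"),
  population = c(4200, 1500, 8000),
  area_km2   = c(2.0, 15.0, 0.5)
)

# Population density = total population / total area
tracts$density <- tracts$population / tracts$area_km2

# Centroid of a simple polygon via the shoelace formula.
# x, y: vertex coordinates in order, without repeating the first vertex.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  cross  <- x * y_next - x_next * y
  a      <- sum(cross) / 2                      # signed polygon area
  cx     <- sum((x + x_next) * cross) / (6 * a)
  cy     <- sum((y + y_next) * cross) / (6 * a)
  c(x = cx, y = cy)
}

print(tracts)
# Centroid of a hypothetical 4 x 2 rectangle
print(polygon_centroid(c(0, 4, 4, 0), c(0, 0, 2, 2)))
```

Computing the centroid from the geometry (rather than, say, averaging vertices) keeps it correct for irregular tract shapes.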
Exploratory Data Analysis (EDA)
The map below shows the 1,528 trespasser fatalities (in purple) in California from 2011 to 2022.
Looking only at census tracts where fatalities have occurred, we can see strike hotspots in Northern California, specifically around Richmond and Berkeley, Davis, and Modesto. Southern California appears to have fewer strikes, although there is a cluster around Pomona and Ontario.
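Restricting the map to tracts with fatalities implies aggregating strike points to tracts. As a hedged illustration (not the author’s code), the base-R sketch below counts hypothetical strike points per polygon with a ray-casting point-in-polygon test. With real data, sf::st_join() or sf::st_intersects() would normally handle this, after transforming the latitude/longitude points to the tracts’ coordinate reference system (e.g. with sf::st_transform()).

```r
# Ray casting: a point is inside a polygon if a horizontal ray from it
# crosses the polygon boundary an odd number of times.
point_in_polygon <- function(px, py, vx, vy) {
  inside <- FALSE
  j <- length(vx)
  for (i in seq_along(vx)) {
    # Edge (j, i) straddles the ray's y level; && skips the division otherwise
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses && px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical tracts (unit squares) and strike locations in planar coordinates
tracts <- list(
  T1 = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  T2 = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
strikes <- data.frame(x = c(0.2, 0.5, 1.5, 3.0),
                      y = c(0.3, 0.8, 0.5, 0.5))

# Count strikes falling inside each tract
counts <- sapply(tracts, function(poly) {
  sum(mapply(point_in_polygon, strikes$x, strikes$y,
             MoreArgs = list(vx = poly$x, vy = poly$y)))
})
print(counts)
```

Tracts whose count is zero are the ones dropped when mapping only tracts with fatalities; the last hypothetical strike falls outside both tracts and is not counted.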
library(tigris)
library(tidycensus)
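# B01003_001 is the ACS total population estimate; geometry = TRUE also returns tract polygons (requires a Census API key)
california <- get_acs(state = "CA", geography = "tract", variables = c("B01003_001", "B25043_001"), geometry = TRUE, year = 2020)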
Global Moran’s I measures spatial autocorrelation, which tells us whether strike counts are clustered, and Local Moran’s I identifies where those clusters lie. I used all census tracts in California, including those without strikes, to assess clustering.
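For reference, the standard global Moran’s I statistic is written below. The notation is an assumption, not taken from the original: $x_i$ is the strike count in tract $i$, $\bar{x}$ is its mean over all $n$ tracts, and $w_{ij}$ is the spatial weight between tracts $i$ and $j$ (nonzero only for neighbors). Values above the null expectation $E[I]$ suggest clustering; values below it suggest dispersion.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij},
\qquad
E[I] = -\frac{1}{n-1}
```

With row-standardized weights and every tract having at least one neighbor, $S_0 = n$ and the leading factor equals 1.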
The result from Moran’s I test is displayed below.
library(spdep)
Building the tract neighbor list with spdep::poly2nb(california_tracts) produced two warnings: some tracts have no neighbors, and the neighbor graph consists of 5 disconnected sub-graphs (spdep suggests increasing the snap argument if either result is unexpected).
A Moran’s I of roughly 0.16 indicates weak positive spatial autocorrelation: neighboring census tracts tend to have somewhat similar numbers of strikes.
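The interpretation above can be made concrete with a small base-R sketch. It uses a hypothetical 4 x 4 grid of tracts with made-up strike counts and rook contiguity (not the study’s data or its queen-contiguity neighbors), computes global Moran’s I with row-standardized weights, and compares it with E[I] and a permutation-based pseudo p-value. In the actual analysis, spdep::moran.test() or spdep::moran.mc() would typically be used instead.

```r
set.seed(42)

# Hypothetical 4 x 4 grid of tract strike counts, listed row by row;
# high counts sit in the top-left corner, so clustering is built in.
counts <- c(5, 4, 1, 0,
            4, 3, 1, 0,
            1, 1, 0, 0,
            0, 0, 0, 1)
nr <- 4; nc <- 4; n <- nr * nc

# Rook contiguity (cells sharing an edge are neighbors), then row-standardize
W <- matrix(0, n, n)
for (rr in 1:nr) for (cc in 1:nc) {
  i <- (rr - 1) * nc + cc
  if (rr > 1)  W[i, i - nc] <- 1
  if (rr < nr) W[i, i + nc] <- 1
  if (cc > 1)  W[i, i - 1] <- 1
  if (cc < nc) W[i, i + 1] <- 1
}
W <- W / rowSums(W)

# Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

obs      <- morans_i(counts, W)
expected <- -1 / (n - 1)                       # E[I] under no autocorrelation
perm     <- replicate(999, morans_i(sample(counts), W))
p_value  <- (sum(perm >= obs) + 1) / (999 + 1) # one-sided pseudo p-value

cat(sprintf("I = %.3f, E[I] = %.3f, pseudo p = %.3f\n", obs, expected, p_value))
```

Because clustering is built into the toy counts, the statistic lands well above E[I] here; a real-data value such as the roughly 0.16 reported above is weaker but still positive.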
Local Moran’s I
Local Moran’s I identifies the specific tracts where strikes are clustered. The results are shown in the map below.
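A minimal base-R sketch of Anselin’s local Moran’s I (one common normalization) on a hypothetical 4 x 4 grid: each cell’s $I_i$ compares its own deviation from the mean with the average deviation of its neighbors, and the signs place it in a Moran scatterplot quadrant (High-High, Low-Low, High-Low, Low-High). The counts and grid are made up; a real cluster map would also keep only tracts whose $I_i$ is statistically significant, for example using p-values from spdep::localmoran().

```r
# Hypothetical 4 x 4 grid of strike counts, listed row by row
counts <- c(5, 4, 1, 0,
            4, 3, 1, 0,
            1, 1, 0, 0,
            0, 0, 0, 1)
nr <- 4; nc <- 4; n <- nr * nc

# Rook-contiguity weights, row-standardized
W <- matrix(0, n, n)
for (rr in 1:nr) for (cc in 1:nc) {
  i <- (rr - 1) * nc + cc
  if (rr > 1)  W[i, i - nc] <- 1
  if (rr < nr) W[i, i + nc] <- 1
  if (cc > 1)  W[i, i - 1] <- 1
  if (cc < nc) W[i, i + 1] <- 1
}
W <- W / rowSums(W)

z       <- counts - mean(counts)
lag_z   <- as.vector(W %*% z)     # average deviation of each cell's neighbors
m2      <- sum(z^2) / n
local_i <- (z / m2) * lag_z       # local Moran's I_i

# Quadrant from the signs of z and its spatial lag
# (a zero on either side would fall through to "Low-High")
quadrant <- ifelse(z > 0 & lag_z > 0, "High-High",
            ifelse(z < 0 & lag_z < 0, "Low-Low",
            ifelse(z > 0 & lag_z < 0, "High-Low", "Low-High")))

print(data.frame(cell = 1:n, count = counts, local_i = round(local_i, 3), quadrant))
```

With row-standardized weights, the local values sum to n times the global Moran’s I, which is why the local statistic is described as a decomposition of the global one.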